**Pull Request Title:** Integrate OpenAI Whisper API Support and Service Abstraction Layer (Fixes #137) #138

agentmarketbot
Contributor

Pull Request Description

Overview

This pull request addresses Issue #137, "Support Additional Whisper Services (Focus on OpenAI Whisper API)", by adding the OpenAI Whisper API as a transcription backend alongside the existing AWS Whisper backend. Users of the Telegram bot can now choose which service transcribes their voice messages.

Key Changes

  1. Transcription Service Architecture:

    • Introduced an abstract base class TranscriptionService that establishes a common interface for all transcription services. This includes:
      • An abstract method transcribe_audio(file_url: str) for implementing specific transcription logic.
      • A helper method _download_audio(file_url: str) to standardize audio file downloads and reduce code duplication.
  2. AWS Integration:

    • Modified the existing AWSTranscriber class to inherit from the new TranscriptionService, thus standardizing its interface with the new architecture.
  3. OpenAI Integration:

    • Developed a new OpenAITranscriber class that encapsulates logic for utilizing the OpenAI Whisper API. This new module handles:
      • Transcription requests to OpenAI’s API.
      • Audio file processing required for proper API interaction.
  4. Factory Pattern Implementation:

    • Implemented a TranscriptionServiceFactory to simplify the instantiation of the appropriate transcriber. It allows users to create instances of either AWSTranscriber or OpenAITranscriber based on their specified preferences.
  5. Usage Examples:

    • Transcribers are created through the factory and share a single call interface:
      • For AWS:
        aws_services = AWSServices()
        transcriber = TranscriptionServiceFactory.create_service('aws', aws_services=aws_services)
      • For OpenAI:
        transcriber = TranscriptionServiceFactory.create_service('openai', api_key='YOUR_API_KEY')
      • Both services are invoked using:
        transcript = transcriber.transcribe_audio(file_url)
  6. Documentation Updates:

    • Added documentation on configuring and using both transcription services within the bot.
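
The architecture described in items 1 and 4 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class and method names come from the description above, but the method bodies, the `_services` registry, the `.ogg` suffix, and the use of `urllib` for downloads are assumptions made for the example. The two concrete classes are placeholders that only record their constructor arguments.

```python
import tempfile
import urllib.request
from abc import ABC, abstractmethod


class TranscriptionService(ABC):
    """Common interface shared by all transcription backends (sketch)."""

    @abstractmethod
    def transcribe_audio(self, file_url: str) -> str:
        """Return the transcript for the audio file at file_url."""

    def _download_audio(self, file_url: str) -> str:
        """Download the audio to a temporary file and return its path."""
        # Assumption: a plain GET is sufficient; the real helper may differ.
        with urllib.request.urlopen(file_url) as response:
            data = response.read()
        # Assumption: voice messages arrive as OGG files.
        with tempfile.NamedTemporaryFile(suffix=".ogg", delete=False) as tmp:
            tmp.write(data)
        return tmp.name


class AWSTranscriber(TranscriptionService):
    """Placeholder: the real class wraps the project's AWSServices object."""

    def __init__(self, aws_services):
        self.aws_services = aws_services

    def transcribe_audio(self, file_url: str) -> str:
        raise NotImplementedError("AWS call omitted in this sketch")


class OpenAITranscriber(TranscriptionService):
    """Placeholder: the real class calls the OpenAI Whisper API."""

    def __init__(self, api_key: str):
        self.api_key = api_key

    def transcribe_audio(self, file_url: str) -> str:
        raise NotImplementedError("OpenAI call omitted in this sketch")


class TranscriptionServiceFactory:
    # Hypothetical registry mapping a service name to its class.
    _services = {"aws": AWSTranscriber, "openai": OpenAITranscriber}

    @classmethod
    def create_service(cls, service_type: str, **kwargs) -> TranscriptionService:
        """Instantiate the transcriber registered under service_type."""
        try:
            service_cls = cls._services[service_type.lower()]
        except KeyError:
            raise ValueError(
                f"Unknown transcription service: {service_type!r}"
            ) from None
        return service_cls(**kwargs)
```

With a registry like this, supporting another backend would only require a new subclass and one more dictionary entry; callers keep using `create_service(...)` and `transcribe_audio(file_url)` unchanged.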

Acceptance Criteria

  • The bot now supports transcription via both AWS Whisper and OpenAI Whisper APIs.
  • Users can easily configure their preferred transcription service.
  • Comprehensive test coverage and updated documentation are in place.

Additional Context

The integration of the OpenAI Whisper API provides an alternative transcription option known for its accuracy across various languages and accents, potentially enhancing user experience and transcription quality.

Fixes #137

Thank you for reviewing this pull request! Please let me know if you have any questions or require further adjustments.

@agentmarketbot
Contributor Author

Aider:

Update Summary for PR #138: OpenAI Whisper API Integration

We have successfully implemented the changes required to support the OpenAI Whisper API as an additional transcription service for the Telegram bot, addressing Issue #137. Here’s a concise overview of the modifications made:

  1. Transcription Service Architecture:

    • Created an abstract base class TranscriptionService.
    • Implemented two concrete classes: AWSTranscriber for AWS Whisper and OpenAITranscriber for OpenAI Whisper.
    • Introduced a TranscriptionServiceFactory to allow easy switching between these services.
  2. Configuration Updates:

    • Added support for the OPENAI_API_KEY environment variable.
    • Introduced the TRANSCRIPTION_SERVICE variable to allow users to select either 'aws' or 'openai' as their transcription service.
  3. Bot Handlers Update:

    • Replaced the old AudioTranscriber references with the factory-based approach.
    • Adjusted the service initialization to utilize the new factory.
    • Updated processing of voice messages to incorporate the new transcription service.
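
For reviewers who want a concrete picture of the OpenAI side, the following is a hedged sketch of what an `OpenAITranscriber` could look like. It is not the code from this PR. It assumes the official `openai` Python package (v1 client) and the `whisper-1` model, and it is written as a standalone class rather than a `TranscriptionService` subclass to keep the example short. The `client` and `model` parameters are hypothetical additions that make the class testable without network access.

```python
import os
import tempfile
import urllib.request


class OpenAITranscriber:
    """Sketch of an OpenAI-backed transcriber (standalone for brevity)."""

    def __init__(self, api_key=None, client=None, model="whisper-1"):
        if client is None:
            # Imported lazily so a stub client can be injected in tests.
            from openai import OpenAI
            client = OpenAI(api_key=api_key)
        self._client = client
        self._model = model

    def _download_audio(self, file_url):
        """Download the audio to a temporary .ogg file and return its path."""
        with urllib.request.urlopen(file_url) as response:
            data = response.read()
        with tempfile.NamedTemporaryFile(suffix=".ogg", delete=False) as tmp:
            tmp.write(data)
        return tmp.name

    def transcribe_audio(self, file_url):
        """Download the file, send it to the API, and return the text."""
        path = self._download_audio(file_url)
        try:
            with open(path, "rb") as audio:
                result = self._client.audio.transcriptions.create(
                    model=self._model, file=audio
                )
            return result.text
        finally:
            # Remove the temporary file whether or not the request succeeded.
            os.remove(path)
```

Injecting the client keeps the API key handling in one place and lets tests substitute a fake object that exposes `audio.transcriptions.create`.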

User Instructions:

  • To utilize the OpenAI Whisper service, users should:
    1. Set the environment variable TRANSCRIPTION_SERVICE=openai.
    2. Provide their OpenAI API key in the OPENAI_API_KEY environment variable.

By default, the AWS Whisper service will continue to be used unless otherwise specified.
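
The selection logic described above might be resolved along these lines. This is an illustrative sketch only: the environment variable names and the `aws` default come from this summary, while the function name `load_transcription_config`, its return shape, and the error handling are assumptions.

```python
import os


def load_transcription_config(env=None):
    """Resolve which transcription service to use from environment variables."""
    env = os.environ if env is None else env

    # AWS stays the default when TRANSCRIPTION_SERVICE is unset.
    service = env.get("TRANSCRIPTION_SERVICE", "aws").strip().lower()
    if service not in ("aws", "openai"):
        raise ValueError(f"Unsupported TRANSCRIPTION_SERVICE: {service!r}")

    config = {"service": service}
    if service == "openai":
        api_key = env.get("OPENAI_API_KEY")
        if not api_key:
            # Fail early instead of on the first voice message.
            raise RuntimeError(
                "OPENAI_API_KEY must be set when TRANSCRIPTION_SERVICE=openai"
            )
        config["api_key"] = api_key
    return config
```

Passing a plain dictionary as `env` makes the behavior easy to check without touching the real process environment.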

With these changes, users can switch between the AWS and OpenAI Whisper services by setting an environment variable.

@vadanrod14 vadanrod14 closed this Jan 27, 2025